The need for large amounts of annotated training data has become a common constraint on various deep learning systems. In this paper, we propose a weakly supervised scene text detection method (WeText) that trains robust and accurate scene text detection models by learning from unannotated or weakly annotated data. With a "light" supervised model trained on a small fully annotated dataset, we explore semi-supervised and weakly supervised learning on a large unannotated dataset and a large weakly annotated dataset, respectively. For the semi-supervised learning, the light supervised model is applied to the unannotated dataset to search for more character training samples, which are then combined with the small annotated dataset to retrain a superior character detection model. For the weakly supervised learning, the character search is guided by high-level annotations of words/text lines, which are widely available and much easier to prepare. In addition, we design a unified scene character detector by adapting regression-based deep networks, which greatly relieves the error accumulation issue that widely exists in most traditional approaches. Extensive experiments across different unannotated and weakly annotated datasets show that scene text detection performance can be clearly boosted under both scenarios, where the weakly supervised learning can achieve state-of-the-art performance using only 229 fully annotated scene text images.
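The character-sample search described above can be illustrated with a minimal sketch. It is not the WeText implementation: the abstract does not state how candidate characters are selected, so the confidence thresholds, the "fraction inside a word box" criterion, and the function names `mine_semi_supervised` and `mine_weakly_supervised` are all assumptions made for illustration. Detections are taken to be (box, score) pairs produced by the light supervised model.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (character box, confidence score)


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def overlap_fraction(char_box: Box, word_box: Box) -> float:
    """Fraction of the character box's area that lies inside the word box."""
    inter = area((
        max(char_box[0], word_box[0]),
        max(char_box[1], word_box[1]),
        min(char_box[2], word_box[2]),
        min(char_box[3], word_box[3]),
    ))
    char_area = area(char_box)
    return inter / char_area if char_area > 0 else 0.0


def mine_semi_supervised(dets: List[Detection],
                         score_thresh: float = 0.9) -> List[Box]:
    """Hypothetical rule: without annotations, keep only very confident detections."""
    return [box for box, score in dets if score >= score_thresh]


def mine_weakly_supervised(dets: List[Detection],
                           word_boxes: List[Box],
                           score_thresh: float = 0.5,
                           inside_thresh: float = 0.8) -> List[Box]:
    """Hypothetical rule: word/text-line boxes guide the search, so a lower
    score threshold is accepted when the detection lies mostly inside a word box."""
    kept = []
    for box, score in dets:
        if score < score_thresh:
            continue
        if any(overlap_fraction(box, w) >= inside_thresh for w in word_boxes):
            kept.append(box)
    return kept


# Toy detections from the light model on one image (made-up numbers).
detections = [
    ((0, 0, 10, 10), 0.95),       # confident, inside the word box
    ((100, 100, 110, 110), 0.6),  # moderately confident, outside any word box
    ((2, 2, 8, 8), 0.6),          # moderately confident, inside the word box
    ((50, 50, 60, 60), 0.3),      # low confidence
]
words = [(0, 0, 40, 12)]

semi_samples = mine_semi_supervised(detections)
weak_samples = mine_weakly_supervised(detections, words)
```

In this sketch the mined boxes would be added to the small fully annotated set before retraining the character detector. The point of the weakly supervised variant is that word-level boxes filter out background detections, which is why a looser score threshold is tolerable there.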